Automatic human utility evaluation of ASR systems: does WER really predict performance?

Authors

  • Benoît Favre
  • Kyla Cheung
  • Siavash Kazemian
  • Adam Lee
  • Yang Liu
  • Cosmin Munteanu
  • Ani Nenkova
  • Dennis Ochei
  • Gerald Penn
  • Stephen Tratz
  • Clare R. Voss
  • Frauke Zeller
Abstract

We propose an alternative evaluation metric to Word Error Rate (WER) for the decision audit task of meeting recordings, which exemplifies how to evaluate speech recognition within a legitimate application context. Using machine learning on an initial seed of human-subject experimental data, our alternative metric handily outperforms WER, which correlates very poorly with human subjects’ success in finding decisions given ASR transcripts with a range of WERs.
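The abstract contrasts WER with a task-grounded metric. For context, WER poses ASR evaluation as word-level string editing: the minimum number of substitutions, insertions, and deletions needed to turn the reference transcript into the hypothesis, divided by the reference length. The sketch below is illustrative only (it is not the paper's implementation) and assumes simple whitespace tokenization:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via Levenshtein distance over word tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution (or match)
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat", "the bat sat")` gives 1/3: one substitution over a three-word reference. Note that WER weights every error equally, which is precisely the property the paper argues fails to predict human task success.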


Similar articles

End-to-End Evaluation of a Speech-to-Speech Translation System in TC-STAR

The paper describes an evaluation methodology to evaluate speech-to-speech translation systems and their results. The evaluation scheme uses questionnaires filled in by human judges for addressing the adequacy and fluency of audio translation outputs and was applied in the second TC-STAR evaluation campaign. The same evaluation methodology is carried out both on the outputs of an automatic syst...


Using Audio Quality to Predict Word Error Rate in an Automatic Speech Recognition System

Faced with a backlog of audio recordings, users of automatic speech recognition (ASR) systems would benefit from the ability to predict which files would result in useful output transcripts in order to prioritize processing resources. ASR systems used in non-research environments typically run in “real time”. In other words, one hour of speech requires one hour of processing. These systems prod...


How to evaluate ASR output for named entity recognition?

The standard metric to evaluate automatic speech recognition (ASR) systems is the word error rate (WER). WER has proven very useful in stand-alone ASR systems. Nowadays, these systems are often embedded in complex natural language processing systems to perform tasks like speech translation, man-machine dialogue, or information retrieval from speech. This exacerbates the need for the speech proce...


Towards spoken clinical-question answering: evaluating and adapting automatic speech-recognition systems for spoken clinical questions

OBJECTIVE To evaluate existing automatic speech-recognition (ASR) systems to measure their performance in interpreting spoken clinical questions and to adapt one ASR system to improve its performance on this task. DESIGN AND MEASUREMENTS The authors evaluated two well-known ASR systems on spoken clinical questions: Nuance Dragon (both generic and medical versions: Nuance Gen and Nuance Med) a...


On the Use of Information Retrieval Measures for Speech Recognition Evaluation

This paper discusses the evaluation of automatic speech recognition (ASR) systems developed for practical applications, suggesting a set of criteria for application-oriented performance measures. The commonly used word error rate (WER), which poses ASR evaluation as a string editing process, is shown to have a number of limitations with respect to these criteria, motivating alternative or addit...




Publication date: 2013